
    OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

    Nowadays, open-source large language models like LLaMA have emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data with mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose the C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat
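    The idea of treating data sources as coarse-grained reward labels and training a class-conditioned policy with plain supervised learning can be illustrated with a small sketch. This is not the authors' implementation: the source names, the reward values in `SOURCE_REWARD`, the `<source>` prefix format, and the choice to normalize by the sum of weights are all assumptions made for illustration.

```python
import math

# Hypothetical coarse rewards per data source (assumed values, not from the abstract).
SOURCE_REWARD = {"expert": 1.0, "suboptimal": 0.1}


def condition_prompt(prompt: str, source: str) -> str:
    """Prefix the prompt with its data-source class (assumed tag format)."""
    return f"<{source}> {prompt}"


def weighted_sft_loss(batch):
    """Reward-weighted mean negative log-likelihood over a batch.

    Each item is (source, token_log_probs), where token_log_probs holds the
    model's log-probabilities for the reference response tokens.
    """
    total, weight_sum = 0.0, 0.0
    for source, token_log_probs in batch:
        weight = SOURCE_REWARD[source]
        # Per-example loss: mean negative log-likelihood of the response.
        nll = -sum(token_log_probs) / len(token_log_probs)
        total += weight * nll
        weight_sum += weight
    # Normalizing by the weight sum is a design choice of this sketch.
    return total / weight_sum


# Toy usage: one expert example and one sub-optimal example.
batch = [
    ("expert", [math.log(0.5), math.log(0.5)]),
    ("suboptimal", [math.log(0.25), math.log(0.25)]),
]
loss = weighted_sft_loss(batch)
print(condition_prompt("Explain SFT.", "expert"), loss)
```

    In this sketch the sub-optimal example still contributes to the loss, but with a smaller weight, so no pairwise preference labels are needed.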

    Porous single crystalline-like titanium dioxide monolith with enhanced photoelectrochemical performance

    Macro-sized porous single crystalline-like (PSC-like) TiO2 has unique structural advantages: its structural consistency and porosity over a large area would significantly enhance its photoelectrochemical function. However, growing porous single crystalline-like monoliths poses significant technical challenges. Structural consistency requires grain boundaries to be reduced to a minimum, which conflicts with a three-dimensional percolation structure. Here we report a lattice reconstruction strategy based on solid-solid transformation to grow porous single crystalline-like anatase TiO2 dominated by (200) and (101) facets at the 2 cm scale. Unlike a porous single crystal in the traditional sense, it has two different lattice orientations, but it still has good photoelectrochemical properties. Band gap engineering introduces a Ti3+ gap into the lattice to generate Magnéli-phase TinO2n−1, limiting the created active structure to the lattice with a two-dimensional surface, which would open a new avenue for creating highly active surfaces that capture photons and transport electrons stably. As a photocatalyst, the PSC-like TinO2n−1 provides an enhanced exciton lifetime (3–5 ns) and shows significant visible light absorption. The independent PSC-like TinO2n−1 delivers a high photocurrent of 1.8–5.5 mA cm−2 at room temperature and does not decay for 10 h.

    A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms

    We describe a genetic variation map for the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs). This map is based on a comparison of the sequences of three domestic chicken breeds (a broiler, a layer and a Chinese silkie) with that of their wild ancestor, red jungle fowl. Subsequent experiments indicate that at least 90% of the variant sites are true SNPs, and at least 70% are common SNPs that segregate in many domestic breeds. Mean nucleotide diversity is about five SNPs per kilobase for almost every possible comparison: between red jungle fowl and domestic lines, between two different domestic lines, and within domestic lines. This contrasts with the notion that domestic animals are highly inbred relative to their wild ancestors. In fact, most of the SNPs originated before domestication, and there is little evidence of selective sweeps for adaptive alleles on length scales greater than 100 kilobases.